
    Efficient Nearest Neighbors Search for Large-Scale Landmark Recognition

    The problem of landmark recognition has achieved excellent results on small-scale datasets. When dealing with large-scale retrieval, however, issues that were irrelevant with small amounts of data quickly become fundamental to an efficient retrieval phase. In particular, computational time needs to be kept as low as possible, whilst retrieval accuracy has to be preserved as much as possible. In this paper we propose a novel multi-index hashing method called Bag of Indexes (BoI) for Approximate Nearest Neighbors (ANN) search. It drastically reduces the query time and outperforms state-of-the-art methods for large-scale landmark recognition in accuracy. We demonstrate that this family of algorithms can be applied to different embedding techniques, such as VLAD and R-MAC, obtaining excellent results in very short times on several public datasets: Holidays+Flickr1M, Oxford105k and Paris106k.
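The abstract does not detail the BoI algorithm, but its core idea (several hash tables voting for candidate images, then exact re-ranking of the most-voted ones) can be sketched as follows; `BagOfIndexes`, the table count, bit width and voting scheme here are illustrative assumptions, not taken from the paper:

```python
import random

# Sketch of a multi-index voting scheme in the spirit of Bag of Indexes:
# several LSH tables vote for candidate images, and only the most-voted
# candidates are re-ranked by exact distance.

def make_hyperplanes(dim, bits, rng):
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

def lsh_key(vec, planes):
    # one bit per hyperplane: which side of the plane the vector falls on
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) > 0) for plane in planes)

class BagOfIndexes:
    def __init__(self, dim, n_tables=8, bits=10, seed=0):
        rng = random.Random(seed)
        self.tables = [(make_hyperplanes(dim, bits, rng), {}) for _ in range(n_tables)]

    def add(self, idx, vec):
        for planes, buckets in self.tables:
            buckets.setdefault(lsh_key(vec, planes), []).append(idx)

    def query(self, vec, db, topk=5):
        votes = {}
        for planes, buckets in self.tables:
            for idx in buckets.get(lsh_key(vec, planes), []):
                votes[idx] = votes.get(idx, 0) + 1
        # re-rank only the most-voted candidates by exact squared L2 distance
        cands = sorted(votes, key=votes.get, reverse=True)[:topk]
        return sorted(cands, key=lambda i: sum((a - b) ** 2 for a, b in zip(db[i], vec)))
```

Querying with a descriptor already in the database returns that entry first, since it collides in every table and has distance zero.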

    An Energy Saving Road Sweeper Using Deep Vision for Garbage Detection

    Road sweepers are ubiquitous machines that help preserve our cities' cleanliness and health by collecting road garbage and sweeping dirt from our streets and sidewalks. They are often very mechanical instruments, needing to operate in harsh conditions and deal with all sorts of abandoned trash and natural garbage. They are usually composed of rotating brushes, collector belts and bins, and sometimes water or air streams. All of these mechanical tools are usually high in power demand and strongly subject to wear and tear. Moreover, due to the simple working logic often employed by these cleaning machines, these tools work in an “always on”/“max power” state, and any further regulation is left to the pilot. Therefore, adding artificial intelligence able to correctly operate these tools in a semi-automatic way would be greatly beneficial. In this paper, we propose an automatic road garbage detection system, able to locate most types of road waste with great precision and to correctly instruct a road sweeper to handle them. With this simple addition to an existing sweeper, we are able to save more than 80% of the electrical power currently absorbed by the cleaning systems and to reduce brush wear by the same amount (prolonging their lifetime). This is done by choosing when to use the brushes and when not to, with how much strength, and where. The only hardware components needed by the system are a camera and a PC board able to read the camera output (and communicate via CAN bus). The software of the system is mainly composed of a deep neural network for semantic segmentation of images and a real-time program that controls the sweeper actuators with the appropriate timings. To prove the claimed results, we ran extensive tests onboard such a truck, as well as benchmark tests for accuracy, sensitivity, specificity and inference speed of the system.
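The abstract does not spell out the control logic; a minimal sketch of the idea (activate each brush only when the segmentation mask reports enough garbage in its lateral zone, with power proportional to coverage) could look like this, where all names and thresholds are assumptions, not values from the paper:

```python
# Toy brush-control logic driven by a semantic segmentation mask.
# mask: 2D list of 0/1 pixels (1 = garbage), split into one lateral
# zone per brush; each brush gets a power command in [0, 1].

def brush_commands(mask, n_brushes=3, threshold=0.05, full_power_at=0.25):
    height, width = len(mask), len(mask[0])
    zone_w = width // n_brushes
    commands = []
    for b in range(n_brushes):
        garbage = sum(row[x] for row in mask for x in range(b * zone_w, (b + 1) * zone_w))
        coverage = garbage / (height * zone_w)
        if coverage < threshold:
            commands.append(0.0)  # brush off: nothing to sweep in this zone
        else:
            # power grows with garbage coverage, saturating at full_power_at
            commands.append(min(1.0, coverage / full_power_at))
    return commands
```

A real-time loop would call this on every segmented frame and forward the commands to the actuators over the CAN bus.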

    Automatic Generation of Semantic Parts for Face Image Synthesis

    Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Beyond the quality of the generated images, most approaches in the literature put effort into increasing generation diversity in terms of style, i.e. texture. However, they all neglect a different feature: the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually, by means of graphical user interfaces. In this paper, we describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with a specific focus on human faces. Our proposed model embeds the mask class-wise into a latent space where each class embedding can be independently edited. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebMask-HQ dataset, which show that our model can both faithfully reconstruct and modify a segmentation mask at the class level. We also show that our model can be placed before a SIS generator, opening the way to fully automatic control of both shape and texture generation. Code available at https://github.com/TFonta/Semantic-VAE. Comment: Preprint, accepted for publication at ICIAP 202
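The learned class-wise editing itself needs a trained model, but the interface the abstract describes (decompose the mask into one layer per class, edit a single class independently, recompose) can be mimicked with a toy geometric edit; here `shift_layer` stands in for the learned latent manipulation and is purely illustrative:

```python
# Toy class-wise mask editing: split a label mask into binary layers,
# edit one layer in isolation, then recompose the label mask.

def split_classes(mask, n_classes):
    # one binary layer per class id
    return [[[1 if px == c else 0 for px in row] for row in mask]
            for c in range(n_classes)]

def shift_layer(layer, dx):
    # stand-in for a learned edit: circularly shift the class region by dx pixels
    w = len(layer[0])
    return [[row[(x - dx) % w] for x in range(w)] for row in layer]

def recompose(layers):
    h, w = len(layers[0]), len(layers[0][0])
    out = [[0] * w for _ in range(h)]
    for c, layer in enumerate(layers):
        for y in range(h):
            for x in range(w):
                if layer[y][x]:
                    out[y][x] = c  # later classes overwrite earlier ones
    return out
```

In the paper's pipeline the per-class edit happens in latent space and the decoder guarantees a coherent mask; this sketch only mirrors the decompose/edit/recompose structure.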


    Landmark Recognition: From Small-Scale to Large-Scale Retrieval

    Over recent years, the problem of landmark recognition has been addressed in many different ways. Landmark recognition consists of finding, within a particular dataset of buildings or places, the images most similar to a query image. This chapter reviews the most widely used techniques for landmark recognition, with a specific focus on those based on deep learning. Firstly, the focus is on the classical approaches to creating descriptors for the content-based image retrieval task. Secondly, the deep learning approach, which has shown overwhelming improvements in many computer vision tasks, is presented. Particular attention is given to two major recent breakthroughs in Content-Based Image Retrieval (CBIR): transfer learning, which improves the feature representation and therefore the accuracy of the retrieval system, and fine-tuning, which further boosts retrieval performance. Finally, the chapter covers techniques for large-scale retrieval, in which datasets contain at least a million images.
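The retrieval step shared by all the approaches surveyed in the chapter reduces to ranking database descriptors by similarity to the query descriptor. A minimal cosine-similarity sketch (descriptors are plain lists here; in practice they would come from a fine-tuned CNN):

```python
import math

# Rank database images by cosine similarity of their global descriptors
# to the query descriptor - the core of a content-based retrieval system.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, database, topk=3):
    ranked = sorted(range(len(database)),
                    key=lambda i: cosine(query, database[i]),
                    reverse=True)
    return ranked[:topk]
```

At the million-image scale discussed in the chapter, this exhaustive ranking is exactly what approximate methods such as multi-index hashing replace.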

    Unsupervised Discovery and Manipulation of Continuous Disentangled Factors of Variation

    Learning a disentangled representation of a distribution in a completely unsupervised way is a challenging task that has drawn attention recently. In particular, much focus has been put on separating factors of variation (i.e., attributes) within the latent code of a Generative Adversarial Network (GAN). Achieving this permits controlling the presence or absence of those factors in the generated samples by simply editing a small portion of the latent code. Nevertheless, existing methods that perform very well in a noise-to-image setting often fail when dealing with a real data distribution, i.e., when the discovered attributes need to be applied to real images. Some methods are able to extract a style and apply it to a sample but struggle to maintain its content and identity, while others are unable to apply attributes locally and end up achieving only a global manipulation of the original image. In this article, we propose a completely (i.e., truly) unsupervised method that is able to extract a disentangled set of attributes from a data distribution and apply them to new samples from the same distribution while preserving their content. This is achieved by using an image-to-image GAN that maps an image and a random set of continuous attributes to a new image that includes those attributes. These attributes are initially unknown and are discovered during training by maximizing the mutual information between the generated samples and the attributes' vector. Finally, the obtained disentangled set of continuous attributes can be used to freely manipulate the input samples. We prove the effectiveness of our method on a series of datasets and show its application to various tasks, such as attribute editing, data augmentation, and style transfer.
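The mutual-information maximization the abstract mentions is commonly implemented via the InfoGAN-style variational lower bound; writing $c$ for the attribute vector, $G$ for the image-to-image generator and $Q$ for an auxiliary posterior network (symbols assumed here, not taken from the article):

```latex
I\big(c;\, G(x, c)\big) \;\ge\; \mathbb{E}_{c \sim p(c),\; \hat{x} = G(x, c)}\big[\log Q(c \mid \hat{x})\big] + H(c)
```

Since $H(c)$ is constant for a fixed attribute prior, training maximizes the expectation term: $Q$ must be able to recover the sampled attributes from the generated image, which forces the attributes to have a visible, disentangled effect.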